External review synthesis + v1 release roadmap by shaypal5 · Pull Request #61 · leadforge-dev/leadforge

shaypal5 · 2026-05-05T15:35:15Z

Summary

Lands the Gemini and ChatGPT external-review outputs (two iterations each, six files) targeting the v0.1.0-alpha bundles, plus a structured synthesis under docs/external_review/summaries/.
Adds the integrated v1 release roadmap, design, and acceptance gates under docs/release/.
Adds a hard constraint to CLAUDE.md forbidding public relational tables that reconstruct the label via joins.
Updates .agent-plan.md with the new "Next Up — leadforge-lead-scoring-v1 Curated Dataset Release" section, alpha tracker preserved beneath as historical context.

Critical finding from review

ChatGPT v2 verified locally that student_public bundles reconstruct converted_within_90_days end-to-end through public relational tables: opportunities.close_outcome == "closed_won" plus customers/subscriptions existence (which only exist for converted leads) recover the target with 100% accuracy via joins. This is THE blocker for any public Kaggle/HF release. Phase 2 of the v1 roadmap is the structural fix.

Roadmap shape (7 phases)

Audit and naming — reproduce blocker locally; lock dataset name leadforge-lead-scoring-v1
Snapshot-safe relational export — leadforge/render/relational_snapshot_safe.py + leadforge/validation/relational_leakage.py; bundle schema v4 → v5
Release validation hardening — release_quality.py + leakage_probes.py + reporting.py; calibration / lift / cross-seed bands
Channel-signal audit + dataset card hardening — Gemini's channel-conditional rates partially adopted as a v1 audit (full encoding deferred to post-v1)
Platform packaging — Kaggle dataset-metadata.json + HF README.md with configs + cover image
Notebook sequence (4) + adversarial framing (issue templates, break-me guide)
LLM critique + publish — minimal one-shot critique with Effective Semantic Diversity rubric dimension

Recommendations triage

22 review findings processed in docs/external_review/summaries/recommendations_pass.md with explicit action codes (accept / accept-with-different-approach / defer / out-of-scope-issue). 14 items go to v1; 8 to post-v1 (after-release-roadmap); 2 to v2-track issues. The split is documented item-by-item with rationale.

What's NOT in this PR

Planning artifacts only — no code changes. Implementation begins in follow-up PRs (Phase 1 first).

Doc inventory

docs/external_review/
  gemini/{gemini_report_v1,gemini_report_v2}.md
  chatgpt/{chatgpt_report_v1,chatgpt_report_v2,leadforge_report_v1_critique,leadforge_second_attempt_guidance}.md
  summaries/
    README.md
    {gemini_v1,gemini_v2,chatgpt_v1,chatgpt_v1_critique,chatgpt_guidance,chatgpt_v2}_summary.md
    cross_source_takeaways.md
    key_findings.md
    recommendations_pass.md

docs/release/
  v1_release_roadmap.md       # active source of truth
  v1_release_design.md        # architecture decisions
  v1_acceptance_gates.md      # operational definition of v1-ready
  post_v1_roadmap.md          # deferred items

Test plan

Pre-commit hooks pass (whitespace + EOF fixes auto-applied during first commit; rerun clean on second)
Reviewer reads recommendations_pass.md and confirms triage matches intent
Reviewer reads v1_release_roadmap.md Phase 2 design and confirms snapshot-safe export shape is correct
Reviewer reads v1_acceptance_gates.md and confirms the gate set is the right v1-ready definition
CLAUDE.md hard constraint reads correctly and isn't redundant with existing items

🤖 Generated with Claude Code

Land the Gemini and ChatGPT external-review outputs (two iterations each, six files) targeting the v0.1.0-alpha bundles, plus a structured synthesis under docs/external_review/summaries/ and an integrated v1 release roadmap under docs/release/. Critical finding from review (verified locally by ChatGPT v2): student_public bundles reconstruct converted_within_90_days end-to-end through public relational tables — opportunities.close_outcome plus customers/subscriptions existence recover the target via joins. CLAUDE.md gains a hard constraint forbidding this; the v1 release roadmap's Phase 2 is the structural fix. This PR is planning artifacts only — no code changes. The roadmap defines 7 phases ending with publishing leadforge-lead-scoring-v1 to Kaggle and Hugging Face. Implementation begins in follow-up PRs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

This PR adds planning and reference documentation for moving from the current alpha bundles toward a public v1 dataset release (leadforge-lead-scoring-v1). It lands the raw external review artifacts (Gemini + ChatGPT), a Claude-authored synthesis/triage of those findings, and a phased v1 release roadmap with an explicit “v1-ready” acceptance-gate contract; it also updates project guidance (CLAUDE.md) and the agent execution plan (.agent-plan.md) to reflect the newly identified relational-leakage blocker.

Changes:

Add v1 release planning docs: roadmap, design decisions, and acceptance gates (plus a post-v1 backlog).
Add external review corpus + structured summaries/synthesis/triage under docs/external_review/.
Update constraints and planning trackers (CLAUDE.md, .agent-plan.md) to reflect the relational-leakage finding and v1 workstream.

Reviewed changes

Copilot reviewed 22 out of 24 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
docs/release/v1_release_roadmap.md	Defines the 7-phase v1 release plan and operational “v1-ready” criteria.
docs/release/v1_release_design.md	Captures v1-specific architectural decisions (snapshot-safe relational export, release validation/reporting, packaging, critique loop).
docs/release/v1_acceptance_gates.md	Specifies machine-checkable gates/criteria intended to define “ready to publish.”
docs/release/post_v1_roadmap.md	Collects deferred items and longer-horizon improvements after v1 ships.
docs/external_review/summaries/recommendations_pass.md	Triage of 22 findings with action codes and v1 vs post-v1 targeting.
docs/external_review/summaries/README.md	Index of the external review corpus and synthesized outputs.
docs/external_review/summaries/key_findings.md	Action-prioritized synthesis of review findings (critical→low).
docs/external_review/summaries/gemini_v2_summary.md	Summary of Gemini v2 review and unique contributions.
docs/external_review/summaries/gemini_v1_summary.md	Summary of Gemini v1 review and unique contributions.
docs/external_review/summaries/cross_source_takeaways.md	Cross-source theme consolidation and divergence map.
docs/external_review/summaries/chatgpt_v2_summary.md	Summary of ChatGPT v2 forensic review and proposed roadmap shape.
docs/external_review/summaries/chatgpt_v1_summary.md	Summary of ChatGPT v1 (historical / superseded).
docs/external_review/summaries/chatgpt_v1_critique_summary.md	Summary of ChatGPT’s critique of its v1 report (methodology corrections).
docs/external_review/summaries/chatgpt_guidance_summary.md	Summary of the “second attempt guidance” methodology spec.
docs/external_review/gemini/gemini_report_v2.md	Raw Gemini v2 external review artifact.
docs/external_review/gemini/gemini_report_v1.md	Raw Gemini v1 external review artifact.
docs/external_review/gemini/.gitkeep	Keeps the Gemini review directory tracked.
docs/external_review/chatgpt/leadforge_report_v1_critique.md	Raw ChatGPT critique artifact (methodology + corrections).
docs/external_review/chatgpt/chatgpt_report_v2.md	Raw ChatGPT v2 external review artifact (forensic audit + roadmap).
docs/external_review/chatgpt/chatgpt_report_v1.md	Raw ChatGPT v1 external review artifact (superseded).
docs/external_review/chatgpt/.gitkeep	Keeps the ChatGPT review directory tracked.
CLAUDE.md	Adds an explicit hard constraint prohibiting public relational tables that reconstruct the label via joins; links new v1 docs.
.agent-plan.md	Adds “Next Up — `leadforge-lead-scoring-v1`” section and preserves the alpha tracker as historical context.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

The original roadmap stopped at the phase level; reviewer asked for the phase → N PRs → which-files-each-PR-touches decomposition. Adds: - a "PRs" column to the phase summary table - a new "PR breakdown" section with per-phase enumeration of the 14 planned PRs (planning IDs phase.seq, not GitHub PR numbers) - per-PR labels, file lists, sizes, and intra-phase dependencies Total: 14 PRs targeting the v1.1.0 — Curated dataset v1 release milestone. Phase 3 has 3 PRs (the largest phase); Phases 1 and 4 are single-PR each. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…one naming, CSV split contract Six accepted review findings from PR #61 Copilot review (the three "||"-prefix table claims were false positives and resolved as irrelevant): - COPILOT-1 (v1_release_design.md:37): clarify that flat task splits are emitted as Parquet at tasks/<task_id>/{train,valid,test}.parquet in the current alpha contract; CSV exports are a Phase 5 deliverable. Standardize on `valid.csv` (matches existing `valid.parquet`), drop the `validation.csv`/`valid.csv` divergence. - COPILOT-2/3/4: replace fictional `event_timestamp` with the real per-table timestamp columns (`touch_timestamp`, `session_timestamp`, `activity_timestamp`) plus opportunities' `created_at`. Updates v1_release_design.md (the design statement), v1_acceptance_gates.md (gate G4.4), and CLAUDE.md (hard constraint). Makes all three machine-checkable against the actual schema. - COPILOT-8: resolve git-tag inconsistency. v1_release_design.md:25 no longer mentions `dataset/v1`; standardizes on `leadforge-lead-scoring-v1` as both the git tag and Release name. - COPILOT-9: rename GitHub milestone #7 from "v1.1.0 — Curated dataset v1 release" (which read like a semver package version) to "dataset: leadforge-lead-scoring-v1" — explicitly dataset-scoped, no conflict with the package version decoupling principle. Three references in v1_release_roadmap.md updated to match. Three Copilot threads (COPILOT-5/6/7) claimed tables had `||` prefixes; verified false on inspection — tables use standard `|` delimiters. Resolved as not-applicable. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions · 2026-05-05T16:03:08Z

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #61 in repository https://github.com/leadforge-dev/leadforge. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25387522318 attempt 1
Comment timestamp: 2026-05-05T16:02:18.835567+00:00
PR head commit: ec12b88399cf12eeb8519f6d01c5657d838b0417

Copilot

Pull request overview

Copilot reviewed 22 out of 24 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+# v1 Lead-Scoring Dataset Release Roadmap
+
+**Target:** Publish `leadforge-lead-scoring-v1` to Kaggle and Hugging Face as a best-in-class educational synthetic CRM dataset family.
+**Source of truth:** This roadmap is derived from `docs/external_review/summaries/recommendations_pass.md` (signed off 2026-05-05).


+| **v1 (first-release-roadmap)** | 14 | #1, #2, #3, #4, #5, #6, #7, #8 (audit), #9, #10, #11 (minimal), #12 (LLM rubric), #14, #15, #16, #19 |
+| **post-v1 (after-release-roadmap)** | 8 | #8 (full encoding), #11 (full CI), #12 (quantitative), #13, #17, #18, #20 (log-normal) |


+prefixed with `TBD` are placeholders set in Phase 3 of the v1 release
+roadmap; a release candidate cannot ship until all `TBD`s are resolved.
+
+This file is the operational definition of done for the v1 release. It is


+
+## Public/instructor diff gate
+
+- **G9.1** Every public/instructor difference is intentional and listed in `release/EXPOSURE_DELTA.md`.


-**v1.0.0 released (2026-05-02).** All milestones (M0–M13) complete. Teaching dataset series (v1–v7) approved by consumer. Package version bumped to 1.0.0 in pyproject.toml and leadforge/version.py.
+**leadforge package v1.0.0 released (2026-05-02).** All framework milestones (M0–M13) complete. Teaching dataset series (v1–v7) approved by consumer. Package version bumped to 1.0.0 in pyproject.toml and leadforge/version.py.
+
+**v0.1.0-alpha datasets shipped (2026-05-05).** Five bundles (intro / intermediate / advanced / intermediate_instructor / tiny_demo) in `release/`, packaged for external review by Gemini + ChatGPT (two iterations each). Synthesis lives under `docs/external_review/summaries/`. **Critical finding:** public relational tables reconstruct the target via joins (verified locally by ChatGPT v2 in `chatgpt_report_v2.md §0`). The v1 release work below addresses this plus the rest of the review consensus.


* docs: PR 1.1 — relational leakage audit on alpha bundles Reproduces the ChatGPT v2 finding on the actual 5000-lead alpha bundles (`release/{intro,intermediate,advanced}/`). All three public tiers reconstruct `converted_within_90_days` at 100% via every probed path; the join-derived LR / HistGBM model also achieves AUC = 1.000 in 5-fold CV. Adds: - `scripts/probe_relational_leakage.py` — five deterministic reconstruction paths (A direct, B opportunity won, C customer exists, D subscription exists, E B∨C∨D) plus a bonus public-relational-only model. The deterministic reconstruction is exposed as a pure function `deterministic_relational_reconstruction(...)` designed to lift verbatim into `leadforge/validation/leakage_probes.py` in PR 3.1. - `tests/scripts/test_probe_relational_leakage.py` — unit test on a hand-built four-lead toy bundle covering each path independently, plus a path-A-collapses-when-label-dropped test that pre-validates Phase 2's fix shape. - `docs/release/v1_current_state_audit.md` — per-tier evidence table, explicit G4.1-G4.6 verdict (all FAIL), pointer to PR 2.1 as the structural fix. Also reaffirms G1.1 (dataset name `leadforge-lead-scoring-v1`, already locked via PR #61's milestone rename) in `v1_acceptance_gates.md`, and checks off Phase 1 in `.agent-plan.md`. This PR is evidence + docs only. The structural fix lives in PR 2.1. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * review: address self-review findings on PR 1.1 Brutal self-review identified 15 issues; this commit addresses the correctness/credibility ones. Summary: Script (scripts/probe_relational_leakage.py): - Assert lead_id uniqueness in deterministic_relational_reconstruction (a validator must refuse non-unique keys, not silently misalign). - Make customers/subscriptions optional inputs — empty frames now accepted, so the same script runs cleanly on Phase-2-fixed bundles. - Branch the agg dict properly — was using has_close_outcome flag *inside* a lambda that pandas would have crashed before invoking. - Add Phase-2-success ablation: re-run deterministic probes on a redacted-in-process view to surface the post-fix shape *now*. - Bonus model gains a without_close_outcome_aggregates variant — the with-aggregates variant is trivially perfect because any_closed_won is Path B aggregated; the without variant is the load-bearing probe. Refactored to sklearn Pipeline (removes string-based scaler dispatch). - Add G4.4 snapshot-window probe directly on event tables. Empirical result: alpha bundles PASS G4.4 (90-day horizon already bounds timestamps). The audit doc now corrects the prior "FAIL (assumed)" framing. - Add --max-accuracy flag for Phase-2 CI gating (exit 2 on threshold violation). - Rewrite docstring to clearly distinguish four orthogonal evidence channels. Tests (tests/scripts/test_probe_relational_leakage.py): - Add _binary_metrics coverage (perfect / all-negative / mixed) — the function had divide-by-zero handling and F1 logic with zero tests. - Add lead_id-uniqueness rejection test. - Add CLI smoke test on release/intermediate (skip if not present). - Add --max-accuracy gate failure test. - Drop the synthetic edge-case lead — toy bundle is now 3 leads cleanly covering A/C without B/D. Audit doc (docs/release/v1_current_state_audit.md): - Distinguish Path A (label in cleartext) from Paths B-E (joins). - Add Phase-2 ablation table showing collapse to (1 - conv_rate). - Report bonus model in two variants; AUC = 1.000 in BOTH variants is a substantive new finding (non-trivial relational features carry the leak independently of close_outcome — i.e., omitting customers/subscriptions tables is structurally required). - Correct G4.4 verdict to PASS (with caveat about within-horizon content still leaking). - Note instructor companion behavior is expected. - Stop reporting AUC on deterministic 0/1 paths. - Remove "sanity floor" framing for Path A (understated the violation). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * review: address Copilot inline comments on PR 1.1 Five real issues from the automated review: 1. Path A (label cleartext read) used .astype(bool), which maps NaN to True. Routed through pandas' nullable "boolean" dtype with explicit fillna(False) so partially-redacted bundles don't report phantom positives. Same fix applied to y_true construction in probe_bundle. 2. _binary_metrics F1 returned NaN when precision OR recall was exactly 0.0 — should be 0.0 (zero-skill is well-defined; NaN should signal undefined). New logic: NaN only when precision/recall is itself NaN; 0.0 when both are zero; harmonic mean otherwise. 3. Module docstring promised exit 1 on "shape errors" but main() only caught FileNotFoundError. Broadened catch to (FileNotFoundError, KeyError, ValueError); error message now includes exception type. 4. probe_bundle() previously bailed with an "error" key when converted_within_90_days was absent. Post-Phase-2 bundles will be exactly that shape. Now reports path_prediction_rates (fraction of leads each path flags positive) unconditionally — useful as a sanity view both before and after the fix. The early-return is now a "note" key rather than "error", with a message pointing the reader at the prediction-rates view. 5. New tests: - F1 zero-precision-and-recall yields zero, not NaN - Path A with NaN-bearing label column maps NaN -> False Total: 10 tests, all passing. Lint/format/mypy clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 5, 2026 15:35

shaypal5 added this to the v1.1.0 — Curated dataset v1 release milestone May 5, 2026

shaypal5 added the type: docs Documentation or narrative changes label May 5, 2026

Copilot started reviewing on behalf of shaypal5 May 5, 2026 15:36 View session

This comment has been minimized.

Sign in to view

Copilot AI reviewed May 5, 2026

View reviewed changes

This comment has been minimized.

Sign in to view

Copilot AI review requested due to automatic review settings May 5, 2026 16:01

Copilot started reviewing on behalf of shaypal5 May 5, 2026 16:03 View session

shaypal5 merged commit 79cd3c8 into main May 5, 2026
10 checks passed

shaypal5 deleted the external-review-roadmap branch May 5, 2026 16:06

Copilot AI reviewed May 5, 2026

View reviewed changes

shaypal5 mentioned this pull request May 5, 2026

docs: PR 1.1 — relational leakage audit on alpha bundles #62

Merged

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

External review synthesis + v1 release roadmap#61

External review synthesis + v1 release roadmap#61
shaypal5 merged 3 commits into
mainfrom
external-review-roadmap

shaypal5 commented May 5, 2026

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

github-actions Bot commented May 5, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		\| v1 (first-release-roadmap) \| 14 \| #1, #2, #3, #4, #5, #6, #7, #8 (audit), #9, #10, #11 (minimal), #12 (LLM rubric), #14, #15, #16, #19 \|
		\| post-v1 (after-release-roadmap) \| 8 \| #8 (full encoding), #11 (full CI), #12 (quantitative), #13, #17, #18, #20 (log-normal) \|


		## Public/instructor diff gate

		- G9.1 Every public/instructor difference is intentional and listed in `release/EXPOSURE_DELTA.md`.

Conversation

shaypal5 commented May 5, 2026

Summary

Critical finding from review

Roadmap shape (7 phases)

Recommendations triage

What's NOT in this PR

Doc inventory

Test plan

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

github-actions Bot commented May 5, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants